The goal of this project is to analyze the complex relationships between economic and population growth, sustainable energy practices, and energy consumption
Author
Affiliation
data detectives - Ayesha, Abhishek, Sheemithra, Toluwanimi, Valerie, Alyssa
School of Information, University of Arizona
Abstract
This project uses the comprehensive energy dataset from Our World in Data, spanning 1900 to 2022, to examine global energy consumption trends in relation to economic growth, population dynamics, and the adoption of sustainable energy practices. The primary goal of the project is to design a predictive dashboard that models a nation’s energy consumption based on essential factors such as population size, GDP, and the proportion of electricity derived from renewable sources. The analysis will use a range of statistical and machine learning techniques, including time series decomposition and linear regression on key predictors. We will evaluate the performance of these regression models using R-squared and Root Mean Squared Error (RMSE) to gauge their explanatory power and accuracy. This evaluation is essential for making the predictions reliable enough to inform energy policy formulation and planning. The project will also analyze trends in the use of renewable energy at the regional and national levels, with particular emphasis on countries that lead the way in sustainable energy practices and those making progress toward lower greenhouse gas emissions. This analysis will provide crucial insights for industry and researchers dedicated to promoting energy sustainability and economic growth.
Question 1
Is it possible to predict a nation’s power consumption from its population size, gross domestic product (GDP), and the percentage of electricity generated from renewable sources, and from how these change over the years?
The distribution is right-skewed: most values are low, and the frequency drops off quickly as values rise. The historical data represented in this histogram are crucial both for training the model to identify patterns and for validating it by comparing the predicted values with the actual data.
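To put a number on the skew described above, one option is the moment-based sample skewness, optionally followed by a log transform, which is a common remedy for right-skewed targets. The sketch below uses only the standard library. The `consumption` values are made up for illustration and are not taken from the dataset, and the project does not state that it applies a log transform.

```python
import math


def sample_skewness(values):
    """Moment-based (Fisher-Pearson) skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical consumption values: mostly small, with one large value.
consumption = [1.0, 1.0, 1.0, 2.0, 10.0]

raw_skew = sample_skewness(consumption)
# log1p(x) = log(1 + x) compresses large values and is defined at x = 0.
log_skew = sample_skewness([math.log1p(v) for v in consumption])

print(f"skewness before log transform: {raw_skew:.2f}")
print(f"skewness after log transform:  {log_skew:.2f}")
```

A positive value indicates a long right tail; a value near zero indicates a roughly symmetric distribution.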
The plot shows the relationship between primary energy consumption and GDP, population size, and the electricity generated from renewable sources. If the scatter points show consistent patterns, their spread and trend suggest correlations between these factors and a country’s power consumption, which could be used to anticipate energy usage.
The graphic shows how population, GDP, renewable electricity, and energy consumption have changed over time, which hints at possible relationships between these variables and a country’s power consumption. For example, if GDP and population growth are accompanied by an upward trend in the primary energy consumption curve, this could indicate that energy demand is driven by economic activity and demographic growth. A rise in the proportion of power derived from renewable sources, on the other hand, does not necessarily translate into reduced energy usage, but it may suggest a change in the composition of energy sources. Periods in which the growth of energy consumption slows down or deviates from trends in GDP and population may be linked to advancements in energy efficiency or structural adjustments in the economy. Where the historical association between these variables is strong, it can be used to develop a predictive model that forecasts future power consumption.
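The abstract names R-squared and RMSE as the evaluation metrics for the regression models. As a minimal sketch of what those two numbers measure, the block below fits a one-predictor least-squares line by hand and scores it. The `gdp` and `energy` values are invented for illustration, and the helper names (`fit_line`, `r_squared`, `rmse`) are hypothetical rather than taken from the project code, which imports scikit-learn for this purpose.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for a single predictor: y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(y_true, y_pred):
    """Share of the variance in y_true that the predictions explain."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as y_true."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Invented toy data: energy use rising roughly linearly with GDP.
gdp = [1.0, 2.0, 3.0, 4.0, 5.0]
energy = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(gdp, energy)
predicted = [slope * g + intercept for g in gdp]

print(f"R-squared: {r_squared(energy, predicted):.3f}")
print(f"RMSE:      {rmse(energy, predicted):.3f}")
```

R-squared is unitless and describes explanatory power, while RMSE is expressed in the units of the target, so the two are usually reported together. In practice both would be computed on a held-out test split rather than on the training data as done here.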
Question 2
Which countries or regions are engaging in sustainable energy practices and relying more on renewable than nonrenewable energy? Which countries are moving toward greater reliance on renewable energy and lower greenhouse gas emissions?
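One simple way to compare countries on this question is to express renewable consumption as a share of total consumption, with both quantities in the same unit (for example terawatt-hours), and then rank by that share. The sketch below is illustrative only: the country names and figures are hypothetical, and `renewable_share` is an assumed helper, not a function from the project.

```python
def renewable_share(renewables_twh, non_renewables_twh):
    """Percentage of total consumption that comes from renewable sources."""
    total = renewables_twh + non_renewables_twh
    if total == 0:
        return None  # the share is undefined when nothing is consumed
    return 100.0 * renewables_twh / total


# Hypothetical records: (country, renewables in TWh, non-renewables in TWh).
records = [
    ("Country A", 60.0, 40.0),
    ("Country B", 10.0, 90.0),
    ("Country C", 25.0, 75.0),
]

shares = {name: renewable_share(r, nr) for name, r, nr in records}

# Rank countries from the highest to the lowest renewable share.
ranking = sorted(shares, key=shares.get, reverse=True)
print(ranking)
```

Because numerator and denominator share a unit, the result is a true percentage between 0 and 100, which makes countries of very different sizes comparable.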
Data Wrangling for Density Plot
Density plots for renewable and non-renewable energy for the continents
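The wrangling step assigns each country's ISO alpha-3 code to a continent before the densities are drawn. A possible way to structure that lookup is to invert a continent-to-codes mapping once, which also catches codes that were accidentally listed twice. The mapping below is deliberately truncated and purely illustrative, and `build_lookup` and `continent_of` are hypothetical names.

```python
# Truncated, illustrative mapping; a real list would cover every country.
CONTINENTS = {
    "Africa": ["DZA", "EGY", "NGA"],
    "Asia": ["CHN", "IND", "JPN"],
    "Europe": ["DEU", "FRA", "GBR"],
    "North America": ["CAN", "MEX", "USA"],
}


def build_lookup(continents):
    """Invert continent -> codes into code -> continent, rejecting duplicates."""
    lookup = {}
    for continent, codes in continents.items():
        for code in codes:
            if code in lookup:
                raise ValueError(f"{code} is listed more than once")
            lookup[code] = continent
    return lookup


ISO_TO_CONTINENT = build_lookup(CONTINENTS)


def continent_of(iso_code):
    """Return the continent for an ISO alpha-3 code, or None if it is unknown."""
    return ISO_TO_CONTINENT.get(iso_code)


print(continent_of("USA"))
print(continent_of("HKG"))
```

With pandas, the same dictionary could be applied to a column with `Series.map`, which leaves unmapped codes as missing values.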
Renewables Consumption Plot
Visualizing the density plots for renewable consumption
Non-Renewables Consumption Plot
Visualizing the density plots for non-renewable consumption
Repo Organization
The project repository comprises the following folders and files:
.github/: Files associated with GitHub, including workflows, actions, and issue templates.
_extra/: Miscellaneous supplementary files that do not fit into the other project categories.
_freeze/: Quarto’s frozen execution results, stored so that documents do not have to be re-executed on every render.
data/: Data files used by the project, including input files and datasets.
images/: Visual assets used in the project documentation and presentation, such as diagrams, charts, and screenshots.
.gitignore: Specifies the files and directories that Git should leave untracked.
README.md: The main entry point for project information: setup, usage instructions, and an overview of the project’s objectives and scope.
_quarto.yml: The Quarto configuration file, with the settings and options that control how the Quarto documents are built and rendered.
about.qmd: A Quarto Markdown file with additional context about the project, such as its purpose and contributors.
index.qmd: The main documentation page for the project. This Quarto Markdown file contains the detailed project description, including all code and visualizations.
Source Code
---
title: "Global Energy Trends"
subtitle: "INFO 523 - Project Final"
author:
  - name: "data detectives - Ayesha, Abhishek, Sheemithra, Toluwanimi, Valerie, Alyssa"
    affiliations:
      - name: "School of Information, University of Arizona"
description: "The goal of this project is to analyze the complex relationships between economic and population growth, sustainable energy practices, and energy consumption"
format:
  html:
    code-tools: true
    code-overflow: wrap
    embed-resources: true
editor: visual
execute:
  warning: false
  echo: false
jupyter: python3
---

## Abstract

This project uses the comprehensive energy dataset from Our World in Data, spanning 1900 to 2022, to examine global energy consumption trends in relation to economic growth, population dynamics, and the adoption of sustainable energy practices. The primary goal of the project is to design a predictive dashboard that models a nation's energy consumption based on essential factors such as population size, GDP, and the proportion of electricity derived from renewable sources. The analysis will use a range of statistical and machine learning techniques, including time series decomposition and linear regression on key predictors. We will evaluate the performance of these regression models using R-squared and Root Mean Squared Error (RMSE) to gauge their explanatory power and accuracy. This evaluation is essential for making the predictions reliable enough to inform energy policy formulation and planning. The project will also analyze trends in the use of renewable energy at the regional and national levels, with particular emphasis on countries that lead the way in sustainable energy practices and those making progress toward lower greenhouse gas emissions.
This analysis will provide crucial insights for industry and researchers dedicated to promoting energy sustainability and economic growth.

# Question 1

Is it possible to predict a nation's power consumption from its population size, gross domestic product (GDP), and the percentage of electricity generated from renewable sources, and from how these change over the years?

```{python}
#| label: question
#| echo: false

# cell library
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from statsmodels.tsa.seasonal import seasonal_decompose
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import matplotlib.dates as mdates

# Load the dataset
data = pd.read_csv('data/owid-energy-data.csv')

# Columns to keep
keep = ['year', 'population', 'gdp', 'electricity_generation',
        'primary_energy_consumption', 'renewables_electricity']

# Data for question 1
q1_data = data[keep]

# Drop rows with any empty values
q1_data_cleaned = q1_data.dropna()

# Save the cleaned dataset to a new CSV file
q1_data_cleaned.to_csv('data/q1_energy_data_cleaned.csv', index=False)
```

```{python}
#| label: question1
#| echo: false

# Load the clean dataset
data = pd.read_csv('data/q1_energy_data_cleaned.csv')

# Calculate the percentage of electricity generated from renewable sources
data['renewables_percentage'] = (data['renewables_electricity'] / data['electricity_generation']) * 100

# Min-max normalize the selected columns to the range [0, 1]
columns_to_normalize = ['electricity_generation', 'primary_energy_consumption', 'renewables_electricity']
for column in columns_to_normalize:
    data[column] = (data[column] - data[column].min()) / (data[column].max() - data[column].min())

# Save the updated dataset to a new CSV file
data.to_csv('data/q1_energy_data_processed.csv', index=False)
```

```{python}
#| label: question2
#| echo: false

# Load the processed dataset
data = pd.read_csv('data/q1_energy_data_processed.csv')

plt.figure(figsize=(10, 8))
sns.histplot(data['primary_energy_consumption'], kde=True)
plt.title('Distribution of Target Variable "primary_energy_consumption"')
plt.xlabel('Primary Energy Consumption')
plt.ylabel('Frequency')
plt.xlim(0, 0.065)
plt.ylim(0, 90)
plt.show()
```

The distribution is right-skewed: most values are low, and the frequency drops off quickly as values rise. The historical data represented in this histogram are crucial both for training the model to identify patterns and for validating it by comparing the predicted values with the actual data.

```{python}
#| label: question3
#| echo: false

# Create a dictionary that maps feature names to desired labels
feature_labels = {
    'population': 'Population (in millions)',
    'gdp': 'GDP (in millions)',
    'renewables_electricity': 'Renewable Electricity'
}

# Set up the plot area to have 1 row and 3 columns
fig, axes = plt.subplots(1, 3, figsize=(12, 6))

for i, column in enumerate(feature_labels):
    if column in ['population', 'gdp']:
        # Population and GDP are shown in millions
        sns.scatterplot(ax=axes[i], x=data[column] / 1e6, y='primary_energy_consumption', data=data, color='green')
        axes[i].set_xlabel(feature_labels[column])
    else:
        sns.scatterplot(ax=axes[i], x=column, y='primary_energy_consumption', data=data, color='green')
        axes[i].set_xlabel(feature_labels[column])
    # Remove the y-axis label for individual plots and y-tick labels for a cleaner look
    axes[i].set_ylabel('')
    axes[i].set_yticklabels([])

# Add a common y-axis label on the left side of the subplots
fig.text(0.00, 0.5, 'Primary Energy Consumption', va='center', rotation='vertical', fontsize=9)

# Adjust layout and add a common title
plt.tight_layout(pad=3.0, w_pad=2.5, h_pad=2.0)
fig.suptitle('Scatter plot of Primary Energy Consumption vs Other Factors', fontsize=12)
plt.show()
```

The plot shows the relationship between primary energy consumption and GDP, population size, and the electricity generated from renewable sources. If the scatter points show consistent patterns, their spread and trend suggest correlations between these factors and a country's power consumption, which could be used to anticipate energy usage.

```{python}
#| label: question4
#| echo: false
#| warning: false

# Load and preprocess the data
data = pd.read_csv('data/q1_energy_data_processed.csv')
data['year'] = pd.to_datetime(data['year'], format='%Y')
data.set_index('year', inplace=True)

# Create the plot with customized settings
fig, ax1 = plt.subplots(figsize=(12, 6))

# Plot with different line styles and markers
ax1.plot(data.index, data['population'], label='Population', color='blue', linestyle='-', marker='o', linewidth=1)
ax1.plot(data.index, data['gdp'], label='GDP', color='red', linestyle='--', marker='x', linewidth=1)

# Second y-axis for the normalized energy columns
ax2 = ax1.twinx()
ax2.plot(data.index, data['renewables_electricity'], label='Renewables Electricity', color='green', linestyle='-.', marker='^', linewidth=1, alpha=0.7)
ax2.plot(data.index, data['primary_energy_consumption'], label='Primary Energy Consumption', color='purple', linestyle=':', marker='s', linewidth=1, alpha=0.7)

# Configure the x-axis with date formatting
ax1.xaxis.set_major_locator(mdates.YearLocator(3))
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))

# Labeling and titles
ax1.set_xlabel('Year')
ax1.set_ylabel('Population and GDP')
ax2.set_ylabel('Renewables and Energy Consumption')
ax1.set_title('Time Series of Population, GDP, Renewable Electricity, and Energy Consumption')

# Combine legends from both axes
lines, labels = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines + lines2, labels + labels2, loc='upper left')
plt.xticks(rotation=45)

# Adjust layout
plt.tight_layout(pad=5.0, w_pad=1.5, h_pad=1.0)
plt.show()
```

The graphic shows how population, GDP, renewable electricity, and energy consumption have changed over time, which hints at possible relationships between these variables and a country's power consumption. For example, if GDP and population growth are accompanied by an upward trend in the primary energy consumption curve, this could indicate that energy demand is driven by economic activity and demographic growth. A rise in the proportion of power derived from renewable sources, on the other hand, does not necessarily translate into reduced energy usage, but it may suggest a change in the composition of energy sources. Periods in which the growth of energy consumption slows down or deviates from trends in GDP and population may be linked to advancements in energy efficiency or structural adjustments in the economy. Where the historical association between these variables is strong, it can be used to develop a predictive model that forecasts future power consumption.

# Question 2

Which countries or regions are engaging in sustainable energy practices and relying more on renewable than nonrenewable energy?
Which countries are moving toward greater reliance on renewable energy and lower greenhouse gas emissions?

```{python}
#| label: MAP1
#| echo: false
#| warning: false

import pandas as pd
import geopandas as gpd
import plotly.express as px

# Loading data
data = pd.read_csv("data/owid-energy-data.csv")
# Note: gpd.datasets was removed in GeoPandas 1.0, so this call needs an older release
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))

# Merging world shapefile with energy data
world = world.merge(data, how='left', left_on='iso_a3', right_on='iso_code')

# Columns used in the calculations below (listed for reference; no rows are dropped in this cell)
columns_to_drop_na = ['solar_consumption', 'wind_consumption', 'hydro_consumption', 'other_renewable_consumption', 'energy_per_capita']

# Filtering data from 2000 to 2023
world = world[(world['year'] >= 2000) & (world['year'] <= 2023)]

# Grouping by year and country, summing renewable energy consumption and averaging energy per capita
world = world.groupby(['iso_a3', 'year', 'name']).agg({
    'solar_consumption': 'sum',
    'wind_consumption': 'sum',
    'hydro_consumption': 'sum',
    'other_renewable_consumption': 'sum',
    'energy_per_capita': 'mean'
}).reset_index()

# Calculating total renewable energy consumption
world['total_renewable_consumption'] = (
    world['solar_consumption'] + world['wind_consumption']
    + world['hydro_consumption'] + world['other_renewable_consumption']
)

# Calculating renewable energy share
world['renewable_energy_share'] = (world['total_renewable_consumption'] / world['energy_per_capita']) * 100

# Setting year column as date
world['year'] = pd.to_datetime(world['year'], format='%Y')

# Plotting the animated map
fig = px.choropleth(
    world,
    locations='iso_a3',
    color='renewable_energy_share',
    hover_name='name',
    hover_data={'iso_a3': False, 'renewable_energy_share': True},
    animation_frame=world['year'].dt.year,
    range_color=(0, 7),
    projection='natural earth',
    color_continuous_scale=px.colors.sequential.Plasma,
    title='Share of Renewable Energy Consumption (%)'
)

# Setting x-axis format to display only the year
fig.update_xaxes(dtick='M1',
                 tickformat='%Y')
fig.show()
```

```{python}
#| label: BAR5
#| echo: false
#| warning: false

import pandas as pd
import geopandas as gpd
import plotly.graph_objects as go

world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))

# Merging world shapefile with energy data (`data` was loaded in the MAP1 cell)
world = world.merge(data, how='left', left_on='iso_a3', right_on='iso_code')

# Columns used in the calculations below (listed for reference; no rows are dropped in this cell)
columns_to_drop_na = ['solar_consumption', 'wind_consumption', 'hydro_consumption', 'other_renewable_consumption', 'energy_per_capita']

# Filtering data from 2000 to 2023
world = world[(world['year'] >= 2000) & (world['year'] <= 2023)]

# Grouping by year and country, summing renewable energy consumption and averaging energy per capita
world = world.groupby(['iso_a3', 'year', 'name']).agg({
    'solar_consumption': 'sum',
    'wind_consumption': 'sum',
    'hydro_consumption': 'sum',
    'other_renewable_consumption': 'sum',
    'energy_per_capita': 'mean'
}).reset_index()

# Calculating total renewable energy consumption
world['total_renewable_consumption'] = (
    world['solar_consumption'] + world['wind_consumption']
    + world['hydro_consumption'] + world['other_renewable_consumption']
)

# Calculating renewable energy share
world['renewable_energy_share'] = (world['total_renewable_consumption'] / world['energy_per_capita']) * 100

# Setting year column as date
world['year'] = pd.to_datetime(world['year'], format='%Y')

# Calculating mean renewable energy share by country
country_stats = world.groupby('name')['renewable_energy_share'].mean().reset_index()

# Identifying the top 10 countries with the highest mean renewable energy share
top_10_countries = country_stats.nlargest(10, 'renewable_energy_share')['name'].tolist()

# Filtering data for the top 10 countries
top_10_world = world[world['name'].isin(top_10_countries)]

# Sorting data by renewable energy share
top_10_world = top_10_world.sort_values(by='renewable_energy_share', ascending=False)

# Plotting the animated bar plot with Plotly Graph Objects
fig = go.Figure()

# Creating one bar trace per year; only the first year is visible initially
for year, df in top_10_world.groupby('year'):
    fig.add_trace(go.Bar(
        x=df['name'],
        y=df['renewable_energy_share'],
        name=str(year.year),
        hoverinfo='x+y',
        hovertemplate='<b>%{x}</b><br>Renewable Energy Share: %{y:.2f}%<extra></extra>',
        visible=False if year != top_10_world['year'].min() else True,
        marker=dict(color='rgba(50, 171, 96, 0.6)')
    ))

# Adding the play button
fig.update_layout(
    updatemenus=[dict(
        type="buttons",
        buttons=[dict(
            label="Play",
            method="animate",
            args=[None, {"frame": {"duration": 500, "redraw": True},
                         "fromcurrent": True,
                         "transition": {"duration": 300, "easing": "quadratic-in-out"}}]
        )]
    )],
    title='Top 10 Countries with Highest Mean Renewable Energy Share',
    xaxis=dict(title='Country'),
    yaxis=dict(title='Mean Renewable Energy Share (%)', range=[0, 20])
)

# Setting initial layout
fig.update_layout(showlegend=True)

# Creating frames for each year
frames = [
    go.Frame(
        data=[go.Bar(
            x=df['name'],
            y=df['renewable_energy_share'],
            name=str(year.year),
            hoverinfo='x+y',
            hovertemplate='<b>%{x}</b><br>Renewable Energy Share: %{y:.2f}%<extra></extra>',
            marker=dict(color='rgba(50, 171, 96, 0.6)')
        )],
        name=str(year.year)
    )
    for year, df in top_10_world.groupby('year')
]

# Adding frames to the figure
fig.frames = frames
fig.show()
```

```{python}
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))

# Merging world shapefile with energy data
world = world.merge(data, how='left', left_on='iso_a3', right_on='iso_code')

# Dropping NaN values from columns used in calculations
columns_to_drop_na = ['solar_consumption', 'wind_consumption', 'hydro_consumption', 'other_renewable_consumption', 'energy_per_capita']
world = world.dropna(subset=columns_to_drop_na)

# Filtering data from 2000 to 2023
world = world[(world['year'] >= 2000) & (world['year'] <= 2023)]

# Grouping by year and country, summing renewable energy consumption and averaging the rest
world = world.groupby(['iso_a3', 'year', 'name']).agg({
    'solar_consumption': 'sum',
    'wind_consumption': 'sum',
    'hydro_consumption': 'sum',
    'other_renewable_consumption': 'sum',
    'energy_per_capita': 'mean',
    'greenhouse_gas_emissions': 'mean'
}).reset_index()

# Calculating total renewable energy consumption
world['total_renewable_consumption'] = (
    world['solar_consumption'] + world['wind_consumption']
    + world['hydro_consumption'] + world['other_renewable_consumption']
)

# Calculating renewable energy share
world['renewable_energy_share'] = (world['total_renewable_consumption'] / world['energy_per_capita']) * 100

# Setting year column as date
world['year'] = pd.to_datetime(world['year'], format='%Y')

# Calculating mean renewable energy share and greenhouse gas emissions by country
country_stats = world.groupby('name').agg({
    'renewable_energy_share': 'mean',
    'greenhouse_gas_emissions': 'mean'
}).reset_index()

# Identifying top countries with the highest mean renewable energy share
top_renewable_countries = country_stats.nlargest(10, 'renewable_energy_share')

# Identifying top countries with the lowest mean greenhouse gas emissions
top_low_emissions_countries = country_stats.nsmallest(10, 'greenhouse_gas_emissions')

# Creating bar plots
fig_renewable = px.bar(
    top_renewable_countries, x='name', y='renewable_energy_share',
    labels={'renewable_energy_share': 'Mean Renewable Energy Share (%)'},
    title='Top 10 Countries with Highest Mean Renewable Energy Share'
)
fig_emissions = px.bar(
    top_low_emissions_countries, x='name', y='greenhouse_gas_emissions',
    labels={'greenhouse_gas_emissions': 'Mean Greenhouse Gas Emissions'},
    title='Top 10 Countries with Lowest Mean Greenhouse Gas Emissions'
)

# Showing plots (only the emissions plot is displayed)
fig_emissions.show()
```

```{python}
# Create renewable energy datasets as subsets of the data
wind_data = data[['country', 'year', 'wind_consumption']]
solar_data = data[['country', 'year', 'solar_consumption']]
hydro_data = data[['country', 'year', 'hydro_consumption']]

# Remove all missing values
wind_data_clean = wind_data.dropna()
solar_data_clean = solar_data.dropna()
hydro_data_clean = hydro_data.dropna()

# Group the data by year and sum the consumption of each source across all rows
grouped_wind_data = wind_data_clean.groupby('year')['wind_consumption'].sum()
grouped_solar_data = solar_data_clean.groupby('year')['solar_consumption'].sum()
grouped_hydro_data = hydro_data_clean.groupby('year')['hydro_consumption'].sum()

# Plot each energy type with appropriate labels and colors
plt.figure(figsize=(12, 10))
plt.plot(grouped_wind_data.index, grouped_wind_data.values, marker='o', color='skyblue', label='Wind')
plt.plot(grouped_solar_data.index, grouped_solar_data.values, marker='o', color='goldenrod', label='Solar')
plt.plot(grouped_hydro_data.index, grouped_hydro_data.values, marker='o', color='seagreen', label='Hydro')
plt.xlabel('Year')
plt.ylabel('Energy Consumption (in terawatt-hours)')
plt.title('Global Renewable Energy Consumption')
plt.grid(True)
plt.legend()  # Add a legend to differentiate the lines
plt.tight_layout()  # Adjust the plot to ensure everything fits without overlap
plt.show()
```

```{python}
import matplotlib.pyplot as plt
```

### Data Wrangling for Density Plot

Density plots for renewable and non-renewable energy for the continents

```{python}
#| label: data_wrang_density
#| message: false
#| warning: false

# Loading the data set
import pandas as pd
energy = pd.read_csv('data/owid-energy-data.csv')

# Selecting the columns needed for this analysis
energy2 = energy[['year', 'country', 'iso_code', 'biofuel_consumption', 'coal_consumption',
                  'fossil_fuel_consumption', 'gas_consumption', 'hydro_consumption',
                  'nuclear_consumption', 'oil_consumption', 'other_renewable_consumption',
                  'renewables_consumption', 'solar_consumption', 'wind_consumption']]

# Removing rows in which every column except 'year' is missing
energy2_cleaned = energy2.dropna(subset=energy2.columns.difference(['year']), how='all')

# Removing the rows whose 'country' entry is a region or continent rather than a country.
# Loading the data set of country codes
country_codes = pd.read_csv('data/country_codes.csv')

# Selecting the valid country codes
valid_country_codes = set(country_codes['Alpha-3 code'])

# Keeping only rows whose ISO code belongs to a country
energy_filtered = energy2_cleaned[energy2_cleaned['iso_code'].isin(valid_country_codes)]

# Removing rows in which every column except 'year', 'iso_code', and 'country' is missing
energy_wrangled = energy_filtered.dropna(subset=energy_filtered.columns.difference(['year', 'iso_code', 'country']), how='all')

# Replacing the remaining missing values with 0
energy_wrangled = energy_wrangled.fillna(0)

# Select columns to check for 0 values
columns_to_check = energy_wrangled.columns.difference(['year', 'iso_code', 'country'])

# Drop rows that are 0 in all columns except 'year', 'country', and 'iso_code'
energy_wrangled = energy_wrangled[(energy_wrangled[columns_to_check] != 0).any(axis=1)]

# Feature engineering: creating new columns in the data set.
# Note that the data set already has a column for total renewable energy consumption.

# Creating a column for the total consumption of non-renewable energy.
# In the OWID data, fossil_fuel_consumption is already the sum of coal, gas, and oil,
# so only nuclear is added to it; adding coal, gas, and oil again would count them twice.
energy_wrangled['non_renewables_consumption'] = energy_wrangled['fossil_fuel_consumption'] + energy_wrangled['nuclear_consumption']

# Rounding the new column to 3 decimal places
energy_wrangled['non_renewables_consumption'] = energy_wrangled['non_renewables_consumption'].round(3)

# Creating a column for the total consumption
energy_wrangled['total_consumption'] = energy_wrangled['non_renewables_consumption'] + energy_wrangled['renewables_consumption']

# Rounding the total consumption to 3 decimal places
energy_wrangled['total_consumption'] = energy_wrangled['total_consumption'].round(3)

# The next step is to group the countries into continents

# Function that returns the continent for a given ISO code
def group_countries_by_continent(iso_code):
    continents = {
        'Africa': ['DZA', 'AGO', 'BEN', 'BWA', 'BFA', 'BDI', 'CPV', 'CMR', 'CAF', 'TCD', 'COM', 'COG', 'COD', 'DJI', 'EGY', 'GNQ', 'ERI', 'SWZ', 'ETH', 'GAB', 'GMB', 'GHA', 'GIN', 'GNB', 'CIV', 'KEN', 'LSO', 'LBR', 'LBY', 'MDG', 'MWI', 'MLI', 'MRT', 'MUS', 'MAR', 'MOZ', 'NAM', 'NER', 'NGA', 'RWA', 'STP', 'SEN', 'SYC', 'SLE', 'SOM', 'ZAF', 'SSD', 'SDN', 'TZA', 'TGO', 'TUN', 'UGA', 'ZMB', 'ZWE'],
        'Asia': ['AFG', 'ARM', 'AZE', 'BHR', 'BGD', 'BTN', 'BRN', 'KHM', 'CHN', 'CYP', 'GEO', 'IND', 'IDN', 'IRN', 'IRQ', 'ISR', 'JPN', 'JOR', 'KAZ', 'KWT', 'KGZ', 'LAO', 'LBN', 'MYS', 'MDV', 'MNG', 'MMR', 'NPL', 'PRK', 'OMN', 'PAK', 'PSE', 'PHL', 'QAT', 'SAU', 'SGP', 'KOR', 'LKA', 'SYR', 'TWN', 'TJK', 'THA', 'TLS', 'TKM', 'ARE', 'UZB', 'VNM', 'YEM', 'TUR'],
        'Europe': ['ALB', 'AND', 'AUT', 'BLR', 'BEL', 'BIH', 'BGR', 'HRV', 'CZE', 'DNK', 'EST', 'FIN', 'FRA', 'DEU', 'GRC', 'HUN', 'ISL', 'IRL', 'ITA', 'XKX', 'LVA', 'LIE', 'LTU', 'LUX', 'MLT', 'MDA', 'MCO', 'MNE', 'NLD', 'MKD', 'NOR', 'POL', 'PRT', 'ROU', 'RUS', 'SMR', 'SRB', 'SVK', 'SVN', 'ESP', 'SWE', 'CHE', 'UKR', 'GBR', 'VAT'],
        'North America': ['ATG', 'BHS', 'BRB', 'BLZ', 'CAN', 'CRI', 'CUB', 'DMA', 'DOM', 'SLV', 'GRL', 'GRD', 'GTM', 'HTI', 'HND', 'JAM', 'MEX', 'NIC', 'PAN', 'PRI', 'KNA', 'LCA', 'VCT', 'TTO', 'USA'],
        'South America': ['ARG', 'BOL', 'BRA', 'CHL', 'COL', 'ECU', 'FLK', 'GUF', 'GUY', 'PRY', 'PER', 'SUR', 'URY', 'VEN'],
        'Antarctica': ['ATA'],
        'Australia': ['AUS', 'NZL', 'PNG', 'FJI', 'SLB', 'VUT']
    }
    for continent, countries in continents.items():
        if iso_code in countries:
            return continent
    return None  # Return None if ISO code not found in any continent

# Apply the function to create the 'continents' column in the energy_wrangled DataFrame
energy_wrangled['continents'] = energy_wrangled['iso_code'].apply(group_countries_by_continent)

# Removing Hong Kong from the data set, since it is not a country
energy_wrangled = energy_wrangled[energy_wrangled['iso_code'] != 'HKG']
```

### Renewables Consumption Plot

Visualizing the density plots for renewable consumption

```{python}
#| label: density1_redone
#| output: false

import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.animation import FuncAnimation
from matplotlib.ticker import FuncFormatter
from IPython.display import HTML

# Extract unique years from the 'year' column
years = energy_wrangled['year'].unique()

# Function to format the x-axis labels with commas
def format_with_commas(value, pos):
    return "{:,}".format(int(value))

# Function to update the plot for the selected year
def update_plot(year):
    # Filter data for the selected year
    data_year = energy_wrangled[energy_wrangled['year'] == year]
    # Clear the previous plot
    plt.clf()
    # Plot the KDE plot for the selected year
    sns.kdeplot(data=data_year, x='renewables_consumption', hue='continents', multiple='stack', alpha=0.3, linewidth=0.2)
    # Set title
    plt.title(f'Density Plot of Renewables Consumption by Continent for Year {year}')
    # Set the x label
    plt.xlabel('Renewable Energy Consumption')
    # Set the x-axis tick formatter
    plt.gca().xaxis.set_major_formatter(FuncFormatter(format_with_commas))

# Initialize the plot with the first year
update_plot(years[0])

# Create a FuncAnimation object that redraws the plot once per year
ani2 = FuncAnimation(plt.gcf(), update_plot, frames=years, interval=1000)

# Convert the animation to HTML format
html_animation2 = ani2.to_jshtml()
```

```{python}
#| label: html_animation2
HTML(html_animation2)
```

### Non-Renewables Consumption Plot

Visualizing the density plots for non-renewable consumption

```{python}
#| label: density2_redone
#| output: false

import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.animation import FuncAnimation
from matplotlib.ticker import FuncFormatter

# Extract unique years from the 'year' column
years = energy_wrangled['year'].unique()

# Function to format the x-axis labels with commas
def format_with_commas(value, pos):
    return "{:,}".format(int(value))

# Function to update the plot for the selected year
def update_plot(year):
    # Filter data for the selected year
    data_year = energy_wrangled[energy_wrangled['year'] == year]
    # Clear the previous plot
    plt.clf()
    # Plot the KDE plot for the selected year
    sns.kdeplot(data=data_year, x='non_renewables_consumption', hue='continents', multiple='stack', alpha=0.3, linewidth=0.2)
    # Set title
    plt.title(f'Density Plot of Non-Renewables Consumption by Continent for Year {year}')
    # Set the x label
    plt.xlabel('Non-Renewable Energy Consumption')
    # Set the x-axis tick formatter
    plt.gca().xaxis.set_major_formatter(FuncFormatter(format_with_commas))

# Initialize the plot with the first year
update_plot(years[0])

# Create a FuncAnimation object that redraws the plot once per year
ani = FuncAnimation(plt.gcf(), update_plot, frames=years, interval=1000)

# Convert the animation to HTML format
html_animation = ani.to_jshtml()

# Display the HTML animation
HTML(html_animation)
```

```{python}
#| label: html_animation
HTML(html_animation)
```

# Repo Organization

The project repository comprises the following folders and files:

- **.github/:** Files associated with GitHub, including workflows, actions, and issue templates.
- **\_extra/:** Miscellaneous supplementary files that do not fit into the other project categories.
- **\_freeze/:** Quarto's frozen execution results, stored so that documents do not have to be re-executed on every render.
- **data/:** Data files used by the project, including input files and datasets.
- **images/:** Visual assets used in the project documentation and presentation, such as diagrams, charts, and screenshots.
- **.gitignore:** Specifies the files and directories that Git should leave untracked.
- **README.md:** The main entry point for project information: setup, usage instructions, and an overview of the project's objectives and scope.
- **\_quarto.yml:** The Quarto configuration file, with the settings and options that control how the Quarto documents are built and rendered.
- **about.qmd:** A Quarto Markdown file with additional context about the project, such as its purpose and contributors.
- **index.qmd:** The main documentation page for the project. This Quarto Markdown file contains the detailed project description, including all code and visualizations.